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Abstract. In this contribution, we consider the problem of the 
blind separation of noisy instantaneously mixed images. The im- 
ages are modelized by hidden Markov fields with unknown param- 
eters. Given the observed images, we give a Bayesian formulation 
and we propose to solve the resulting data augmentation problem 
by implementing a Monte Carlo Markov Cham (MCMC) proce- 
dure. We separate the unknown variables into two categories: 

1. The parameters of interest which are the mixing matrix, the 
noise covariance and the parameters of the sources distributions. 

2. The hidden variables which are the unobserved sources and the 
unobserved pixels classification labels. 

The proposed algorithm provides in the stationary regime sam- 
ples drawn from the posterior distributions of all the variables in- 
volved in the problem leading to a flexibility in the cost function 
choice. 

We discuss and characterize some problems of non identiflability 
and degeneracies of the parameters likelihood and the behavior of 
the MCMC algorithm in this case. 

Finally, we show the results for both synthetic and real data to 
illustrate the feasibility of the proposed solution. 

I. INTRODUCTION AND MODEL ASSUMPTIONS 

The observations are m images (X'Jti.m, each image X 1 is defined on a 
finite set of sites, S, corresponding to the pixels of the image: X 1 = (x % r ) r &S- 
The observations are noisy linear instantaneous mixture of n source images 
(S J )j—i., n defined on the same set S: 



n 




where A = (ay) is the unknown mixing matrix, N l = (n*) re s is a zero- 
mean white Gaussian noise with variance cr e |. At each site r G 5, the matrix 
notation is: 

a; = 4s+n (1) 

The noise and source components (N l )i.. m and (S^)j=i.. n are supposed to 
be independent. Each source is modelized by a double stochastic process 
(S^, Z J ). Si is a field of values in a continuous set 1Z and represents the 
real observed image in the absence of noise and mixing deformation. Z J 
is the hidden Markov field representing the unobserved pixels classification 
whose components are in a discrete set, Z° r 6 {l..K J }. The joint probability 
distribution of Z 3 satisfies the following properties, 

vz*, p M (4\z^ {r} ) = p M (4\z j N{r) ) 

VZ>, P M (Zi)>0 

where ^s\{r} denotes the field restricted to <S\{r} = {! 6 5, 1 ^ r} and N(r) 
denotes the set of neighbors of r, according to the neighborhood system 
defined on <S for each source component. According to the Hammersley- 
Clifford theorem, there is an equivalence between a Markov random field and 
a Gibbs distribution, 

Pm(Z') = [W^)]- 1 exp{-H aj (Z j )} 

where H aj is the energy function and ay is a parameter weighting the spatial 
dependencies supposed to be known. Conditionally to the hidden discrete 
field Z^ , the source pixels S 3 r , r £ S are supposed to be independent and 
have the following conditional distribution: 

p(S j \Z j , V j ) = ~[[p r (4\zi,r?) 

where the positive conditional distributions depend on the parameter r\> 6 
lZ d . We assume in the following that p r (. \ z) is a Gaussian distribution with 
parameters r\> = (n jz , o] z ) z =\.,k- 

We note that we have a two-level inversion problem: 

1. The problem described by (1) when the mixing matrix A is unkown is 
the source separation problem [1, 2, 3]. 

2. Given the source component S j , the estimation of the parameter rf 
and the recovering of the hidden classification Z J is known as the un- 
supervised segmentation [4]. 

In this contribution, given the observations X l ,i = l..m, we propose a so- 
lution to jointly separate the n unknown sources and perform their unsu- 
pervised segmentations. In section II, we give a Bayesian formulation of the 



problem. In section III, an MCMC algorithm based on the data augmen- 
tation modelization is proposed. In section IV, we focus on the problem of 
the non identifiability and the degeneracies occurring in the source separa- 
tion problem and their effects on the MCMC implementation. In section V, 
numerical simulations are shown to illustrate the feasibility of the solution. 

II. BAYESIAN FORMULATION 

Given the observed data X = (X 1 , ...,X m ), our objective is the estimation 
of the mixing matrix A, the noise covariance R e = diag(a e l, er^), the 
means and variances {\ij Z ,v] z )i=\..n,z=\..K of the conditional Gaussians of 
the prior distribution of the sources. The a posteriori distribution of the 
whole parameter = (A, R e , fij Z ,o-j z ) contains all the information that we 
can extract from the data. According to the Bayesian rule, we have 



In the section III, we will discuss the attribution of appropriate prior distri- 
bution p{0). Concerning the likelihood, it has the following expression, 



where N denotes the Gaussian distribution, x r the (m x 1) vector of observa- 
tions on the site r, z r is the vector label, /x z = [[ii zi> an d R Zr the 
diagonal matrix diag[a\ Zi , ...,a^ z ]. We note that the expression (2) hasn't 
a tractable form with respect to the parameter because of the integration 
of the hidden variables S and Z . This remark leads us to consider the data 
augmentation algorithm [5] where we complete the observations X by the 
hidden variables (Z, S), the complete data are then (X 7 S, Z). In a previous 
work [6], we implemented restoration maximization algorithms in the one 
dimensional case to estimate the maximum a posteriori estimate of 6. We 
extend this work in two directions: (i) first, the sources are two-dimensional 
signals, (ii) second, we implement an MCMC algorithm to obtain samples of 
6 drawn from its a posteriori distribution. This gives the possibility of not 
being restricted to estimate the parameter by its maximum a posteriori, we 
can consider another cost function and compute the corresponding estimate. 

III. MCMC IMPLEMENTATION 



p(p\X)<x P (X\0)p{0) 





(2) 



We divide the vector of unknown variables into two sub-vectors: The hidden 
variables (Z, S) and the parameter 9 and we consider a Gibbs sampler: 



repeat until convergence, 

1. draw ~ p(Z , S \ X ,0 {k ~ 1} ) 

2. draw W ~ p(0 \ X, ZW, £(*>) 

This Bayesian sampling [7] produces a Markov chain (0 ), ergodic with 
stationary distribution p(0 | X). After ko iterations (warming up), the sam- 
ples (0 ° + ) can be considered to be drawn approximately from their a 
posteriori distribution p(6 | X). Then, by the ergodic theorem, we can ap- 
proximate a posteriori expectations by empirical expectations: 

E[h(0)\X] ( 3 ) 

k=\ 

Sampling (Z, S): The sampling of the hidden fields (Z, S) from p(Z, S\X,0) 
is obtained by, 

1. draw Z from 

P (Z\X,0)(xp(X\Z,0)Pm(Z) 

In this expression, we have two kinds of dependencies: (i) Z are inde- 
pendent across components, p{Z) = W!j—iP{Z^) but each discrete im- 
age Z^ ~ Pm(Z 3 ) has a Markovian structure, (ii) Given Z, the fields 
X are independent through the set S, p(X Z, 0) = Y\ reS p{x r \z r ,0) 
but dependent through the components because of the mixing operation 
p(x r \z r ,9) = N{x r ; Afj, z 7 AR Zr A* + R € ). 

2. draw S \ Z from 

p(S \X,Z,0) = Y[ J\f(s r ; m a / ost , V r apost ) 
res 

where the a posteriori mean and covariance are easily computed [8], 

y apost = [A*R~ 1 A + R-] ] " 1 

m a P ost = V r apost (A*R- 1 x r + R- i 1 n Zr ) 

Sampling 0: Given the observations X and the samples (Z,S), the sam- 
pling of the parameter becomes an easy task (this represents the princi- 
pal reason of introducing the hidden sources). The conditional distribution 
p(0 | X, Z, S) is factorized into two conditional distributions, 

p(0 | X, Z, S) oc p(A, R t | X, S)p(fj., tr\S,Z) 

leading to a separate sampling of (A,R e ) and (fx, cr). The choice of the a 
priori distributions is not an easy task [9]. The complete likelihood of (A, R e ) 



belongs to the location scale family [10] and applying the Jeffrey's rule we 
have, 

p(A, R e ) oc \^(R e )\i = \R e \^ t11 

where p(A) is locally uniform and T is the Fisher information matrix. We 
obtain an inverse Wishart distribution for the noise covariance and a Gaussian 
distribution for the mixing matrix, 

r R- x ~ Wi m {a e ,/3 e ), a e = p e = ^{R xx - R .njRj 

( p(A | R e ) ~ M(fj, a ,R a ),fj, a = veciRxsR- 1 ), R a = r|y Rj^ ® Re 

where we define the empirical statistics R xx = jj^ x rX*, R xs = j^y J2 r x r s * 
and R ss = j$]J2r s r s *- We note that the covariance matrix of A is pro- 
portional to the noise to signal ratio. This explains the fact noted in [11] 
concerning the slow convergence of the EM algorithm. For the parameters 
(/x, er), we choose conjugate priors [7]. The reason of this choice is the elim- 
ination of degeneracies occurring when estimating the variances <jj Z , This 
point is elucidated in section IV. The a posteriori distribution remains in the 
same family as the likelihood function, Gaussian for the means and Inverse 
Gamma for the variances. The expressions are the same as in [7]. 



IV. IDENTIFIABILITY AND DEGENERACIES 

It is well known that in the source separation problem there exist scale and 
permutation indeterminations. This can be seen when multiplying the matrix 
A by a scale permutation matrix AP and the sources by P T A~ 1 . The permu- 
tation indetermination doesn't degrade the performance of the algorithm. In 
fact, in image processing the size of data |<S| is sufficiently large to avoid the 
Markov chain (A^) produced by the algorithm permuting its columns, the 
probability that the Markov chain changes the a posteriori mode is very low. 
However, the scale indetermination must be eliminated. In practice, after 
each iteration of the MCMC algorithm, the columns of A are normalized. 

There is another kind of indetermination which is the transfer of variances 
between the covariances R z and the noise covariance R e : 

P (X | A, R e + eAAA*,R z - eA, fi z ) = p(X \ A, R e ,R Zlt i z ), e = ±1 

When we study the particular case of diagonal noise covariance and the mix- 
ing matrix A is unitary (A A* = I), we note that an obvious transfer of 
variances occurs when A = al. A retained solution in this paper is the pe- 
nalization of the likelihood by a prior on the variances which eliminates this 
variance transfer. This solution is more robust than the fact of fixing either 
R z or i? e - However, a simultaneous penalization of noise and signal variance 
can induce a transfer between modes. In such situations, the Markov chains 

(R e ) and (R z ) seem to converge to a stationary distribution even after 



a great number of iterations but suddenly a transfer occurs (see the section 
V for numerical illustration). This indetermination is noted in [11] and was 
used to accelerate the convergence of the EM algorithm by forcing the noise 
covariance to be maximized. In the MCMC algorithm, we note that the a 
posteriori covariance of A is R a = r^ji?" 1 (g> R e . Consequently, as the sig- 
nal to noise ratio increases, covariance decreases and the sample A is more 
concentrated on its mean value. 

It is obvious, under the form 2, that degeneracy happens when one of the 
terms constituting the sum approaches to infinity and this is independent of 
the law Pm- 

Consider now the matrices T z = AR Z A* + R e . It's clear that degeneracy 
is produced when, among matrices T z , at least one is singular and one is 
regular. We show in the following that this situation can occur. 

We recall that the matrices R z and R c belong to a closed subset of the set 
of the non negative definite matrices. Constraining matrices to be positive 
definite leads to complicated solutions. The main origin of this complication 
is the fact that the set of positive definite matrices is not closed. For the 
same reason, we don't constrain the mixing matrix A to be of full rank. 

Proposition 1: V A non null, 3 matrices {T z = AR Z A* + R c for z — 
1..K} such that {z \ F z is singular} ^ and {z \ T z is regular} ^ 0. 
R e is necessarily a singular NND matrix and Card({z \ R z is regular}) < 
K. 

For a detailed proof see [12] 

One possible way to eliminate this degeneracy consists in penalizing the 
likelihood by an Inverse Wishart a priori for covariance matrices. In fact, we 
know that the origin of degeneracy is that the covariance matrices R z and 
R e approach the boundary of singularity (in a non arbitrary way). Thus, if 
we penalize the likelihood such that when one of the covariance matrices ap- 
proaches the boundary, the a posteriori distribution goes to zero, eliminating 
the infinity value at the boundary and even forcing it to zero. 

Proposition 2: V X E (R m ) 151 , the likelihood p(X \ 0) penalized by an 
a priori Inverse Wishart for the noise covariance matrix R e or by an a priori 
Inverse Wishart for the matrices R z is bounded and goes to when one of 
the covariance matrices approaches the boundary of singularity. 

For a detailed proof see [12]. 



V. SIMULATION RESULTS 



To illustrate the feasibility of the algorithm, we generate two discrete fields 
of 64 x 64 pixels from the Ising model, 

P M {Z 3 ) = [VK(a 3 )] _1 exp{a^/ Zr=Zs },ai = 2, a 2 = 0.8 



ai > a.2 implies that the first source is more homogeneous than the second 
source. Conditionally to Z, the continuous sources are generated from Gaus- 



sian distributions of means 



and variances cr, 



The sources are then mixed with the matrix A = 



and a 



-2 2 
-3 3 j 

" 0.85 0.44 
u 0.51 0.89 

white Gaussian noise with covariance of / (of = 5) is added. The signal to 
noise ratio is 1 to 3 dB. Figure- 1 shows the mixed signals obtained on the 
detectors. 
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X2 
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figure- 1. The noisy mixed images X 1 and X z 



We apply the MCMC algorithm described in section III to obtain the 
Markov chains A^ k \ B.i k \ [i^J and <r| z . Figures 2 and 3 show the his- 
tograms of the element samples of A and their empirical expectations (3). 
We note the concentration of the histograms representing approximately the 
marginal distributions around the true values and the convergence of the em- 
pirical expectations after about 1000 iterations. Figures 4 and 5 show the 
convergence of the empirical expectations. We note that the convergence of 
the variances is slower that the mixing elements and the means. Figure 6 

" 0.89 -0.44 " 
0.44 0.89 

tary. We note that this transfer occurred after a great number of iterations 
(80000 iterations) and that the sum of the variances remains constant. 



shows the transfer of variances when the matrix A 



is um- 
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Figure-2 The histograms of the 
samples of mixing elements a%j 



Figure-3 Convergence of the empirical 
expectations of atj after 1000 iterations 
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Figure-4 Convergence of the empirical 
expectations of the means rriij 



Figure-5 Convergence of the empirical 
expectations of the variance afj 




We test our algorithm on real data. The first source is a satellite image 
of an earth region and the second source represents the clouds (First column 
of figure 7). The mixed images are shown in the second column of figure 7. 
The results of the algorithm are illustrated in the third column of figure 7 
where the sources are successfully separated. The figure 8 illustrate the joint 
segmentation of the sources. We note that the results of the two segmen- 
tations are the same as the results which can be obtained if we apply the 
segmentation on the original sources. 




(a) (b) (c) 

Figure. 7: (a) Original sources, (b) Mixed sources and (c) Estimated sources 




Figure 8: Segmented images 



VI. CONCLUSION 

In this contribution, we propose an MCMC algorithm to jointly estimate the mixing 
matrix and the parameters of the hidden Markov fields. The problem has an inter- 
esting natural hidden variable structure leading to a two-level data augmentation 
procedure. The observed images are embedded in a wider space composed of the 



observed images, the original unknown images and hidden discrete fields modeliz- 
ing a second attribute of the images and allowing to take into account a Markovian 
structure. The problems of identifiability and degeneracies are mentioned and dis- 
cussed. In this work the number of sources and the number of the discrete values of 
the hidden Markov field are assumed to be known. However, the implementation of 
the algorithm could be extended to involve the reversible jump procedure on which 
we are working. 
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